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Abstract — We seek to characterize the estimation performance 
of a sensor network where the individual sensors exhibit the 
phenomenon of drift, i.e., a gradual change of the bias. Though 
estimation in the presence of random errors has been extensively 
studied in the literature, the loss of estimation performance 
due to systematic errors like drift have rarely been looked 
into. In this paper, we derive closed-form Fisher Information 
matrix and subsequently Cramer-Rao bounds (upto reasonable 
approximation) for the estimation accuracy of drift-corrupted 
signals. We assume a polynomial time-series as the representative 
signal and an autoregressive process model for the drift. When 
the Markov parameter for drift p < 1, we show that the first- 
order effect of drift is asymptotically equivalent to scaling the 
measurement noise by an appropriate factor. For p = 1, i.e., 
when the drift is non-stationary, we show that the constant 
part of a signal can only be estimated inconsistently (non- 
zero asymptotic variance). Practical usage of the results are 
demonstrated through the analysis of 1) networks with multiple 
sensors and 2) bandwidth limited networks communicating only 
quantized observations. 

Index Terms — Sensor Networks, Systematic Errors, Autore- 
gressive Process, Polynomial Regression, Distributed Estimation 



I. Introduction 

A. The phenomenon of drift 

Sensor networks often consist of a number of sensors 
deployed for specific tasks such as detection and estimation 
of the ambient phenomenon [1|. Sensing nodes are equipped 
with sensing, computation, and inference capabilities. Sensor 
networks can potentially deploy a large number of sensors, 
can measure the environment over time and generate data, 
from which we seek to extract relevant information regard- 
ing the phenomenon. Often while assessing the estimation 
performance when using such a network, sensor inaccuracies 
are modeled as independent measurement errors, e.g., i.i.d. 
Gaussian errors were assumed in ||2l.||3ll. However, it has 
been widely reported that a variety of sensors used in various 
applications exhibit systematic errors or drift, e.g., Ground 
Moving Target Indicator radar sensors [4|, tilt sensors in bridge 
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monitoring applications [5|, CO2 sensors used for air quality 
monitoring [6] and salinity sensors for ocean monitoring 0. 
For example, gas sensors are known to drift due to other 
environmental parameters like temperature and humidity JSJ. 

B. Inference with drifting sensors 

When sensors in a network develop systematic errors with 
time, inferences based on these observations become increas- 
ingly inaccurate. For example, a scheme where each sensor can 
track their own drift in a collaborative manner is presented in 
|9| and such a system was shown to accumulate error over 
time. This necessitates periodic calibration/ registration of the 
sensors, a procedure that increases the operating cost for the 
network. Though the issue of drift is often acknowledged in 
the literature, performance analysis of a network consisting 
of drifting sensors has not been done before. Our goal in this 
paper is to characterize the estimation performance of a sensor 
network in terms of the drift properties of the constituent 
sensors. 

We consider multiple sensors deployed in a field of interest 
to monitor a particular environment. The sensors observe the 
same phenomena over n — 1,2, ... ,N time instants and their 
observations are corrupted by noise and drift. The noisy obser- 
vations are relayed to a common sink, where the objective is 
to estimate the parameters governing the observed phenomena. 
Let z n ^ m denote the noisy observation of the 771 th sensor at the 
71 th instant and z m denote the vector [z l m , z 2 , m, ■ ■ ■ , z N,m]'- 
We assume a linear observation model of the form 

z m = X[3 + e m , e m ~ A/"(0, £ m ) (1) 

where x = Xf3 and e m are iV x 1 vectors denoting 
the signal and error terms. Here X is assumed to be a 
iVx (P+l) known matrix (describing the temporal-dynamics) 
and f3 is the (P+l) x 1 vector denoting an unknown but 
deterministic signal. The error term e m consists of noise 
and drift components (to be described later) and is assumed 
to be Gaussian distributed with covariance matrix S m . The 
observations z m are communicated to a sink, whose job is to 
obtain an accurate estimate of f3. From estimation theory (see, 
for example, lITOll and ifTTI ). it is well known that the variance 
of an unbiased estimator is lower bounded by the Cramer-Rao 
Lower Bound (CRB) and that the Maximum-Likelihood (ML) 
estimator asymptotically (for small errors) attains that bound. 
In this paper, we derive the closed-form CRB (upto reasonable 
approximations) for estimating [3 when E m corresponds to 
drift corrupted errors and X corresponds to a polynomial 
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signal. In the subsequent discussion, we provide the motivation 
and full descriptions of S TO and X. 

C. Models of Noise and Drift 

At instant n, let x n and e n . m denote the common signal 
magnitude and observation error of sensor m. When the 
effect due to drift is ignored, the measurement noise is often 
modeled as independent and identically distributed Gaussian 
noise u>„, m , (e.g., 0, O), i.e., 

Zn,m = X n + W n>m , W n , m '~' Af(0, of„), (2) 

where <jf n is the noise variance. However, in the presence of 
drift, the observation error has two components: one due to 
baseline-drift d n , m and the other due to random measurement 
noise io„, m , 

^n,m — %n ~l~ d n Jn -\- W n _ m . (3) 

Drift is generally described as a gradual change of the bias of 
a sensor [8|, [12|. Depending on the specific sensing method- 
ology, various models have been proposed to characterize the 
drift sequence {dn m}£=i m frequency and time domains (for 
a survey, see JD). Below we describe three commonly used 
models of drift: 

1) Frequency domain: The phase-drift in an oscillator is 
often modeled in the frequency domain using several powers 
of frequency (power-law model) [12|, i.e., the power spectral 
density (PSD) is of the form, S m (f) = Yj%=-2 hrn,ip, where 
h m> i are appropriate constants. 

2) Temporal domain - Deterministic: Drift is also modeled 
sometimes as a linear movement of the sensor baseline, i.e., 
d n ,m = a m + (n — l)&m> where the intercept a m is set to 
zero after each calibration and slope b m is assumed to be 
an unknown constant that is often estimated later and com- 
pensated for. Example applications include odor identification 
using gas-sensor arrays [ 13] and air pollution monitoring using 
gas sensor networks [14|. 

3) Temporal domain - Random: In the broader signal 
processing literature, auto-regressive moving average (ARMA) 
processes are often used to describe serially correlated time 
series, an example of which is drift. As a trade-off between 
modeling-efficiency and analytical-complexity, drift is often 
modeled as a first-order autoregressive (AR(1)) process. Ex- 
ample applications include Ground Moving Target Indicator 
(GMTI) radars (H, Q3)), Ring-laser Gyroscopes [16), Liquid 
Chromatography 1171 and sensor networks J9j . 

In this paper, we will use the first-order autoregressive 
model to describe the statistical properties of drift. The AR(1) 
model characterizes the drift behavior at m th sensor in terms 
of an auto-correlation parameter p m (visually, a smaller value 
of p m means that the baseline drift crosses the zero-line 
more often and looks more like white noise) and a strength 
parameter erf (signifying the magnitude of drift), 

d n +l,m — Pmdn.m + $n,m) where 

i i W 

Pm G [0,1] and S n , m ~ JV(0 J o£ tB ). 

Note that, when p = 1, the drift is modeled effectively as a 
non-stationary random walk, as in 19). 



If we define 7 m such that erf , m = 7 m cr^ t , our "AR(1)+White 
noise" model for observation error is completely parameterized 
by Wmi Pmi 7m}' These noise and drift parameters usually 
have to be identified from the empirical PSD for stationary 
noise (e.g., ifTTl . [il8|) or other time-domain features for 
non-stationary noise (e.g., |[T9l ). In this paper, we consider 
the characterization of the sensing uncertainty in terms of 
{(7^, p m , 7 m } as part of system identification that must be 
done prior to an observation cycle. Estimation of these pa- 
rameters is beyond the scope of this paper. Moreover, within 
an observation cycle, the drift-parameters are assumed to 
be constant. If the drift parameters change frequently, our 
proposed framework must be used in conjunction with periodic 
system re-identifications. 

D. Deterministic signal model 

Often in the sensor network literature, the signal of interest 
is assumed to be constant over the observation duration, e.g., 
[2|,[3]. This means x n = /3,Vn. However, such a signal model 
may be too simplistic for real applications and we consider a 
generalization of the form, 

p 

x n = ^p p n p , fteR, (5) 

where (3 p -s are the unknown constants that need to be esti- 
mated and P is the order of the polynomial time-series that 
is assumed to be known. In vector notations, the polynomial 
signal x = \x\, x^ . . . , xn]' assumes a linear form x = X(3, 
where X is the Vandermonde matrix 
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It may be noted that time-varying signals in different appli- 
cations are approximated by either polynomials or piecewise- 
polynomials [20|,|2jf|. For example, a polynomial regression- 
based data gathering algorithm for environmental monitoring 
applications was suggested in ll22ll . Also, a polynomial spline 
approximation of stationary random processes was applied to 
Clarke's model of multipath fading channels in |23|. 

This completes the description of the signal and observation 
noise considered in this paper. We intend to derive the closed 
form CRB (for the estimation of /3 p -s) in terms of the signal 
(P,N) and noise {er^,p m ,7 m } parameters. This would help 
us characterize the performance of a sensor network and 
increase our understanding about its estimation capabilities. 

E. Related work 

A related area of work is the study of systematic-bias or 
model-error estimation schemes using multiple, and sometimes 
collaborative sensors. In the radar signal processing literature, 
the process of model-based estimation and subsequent removal 
of systematic errors prior to target tracking is known as sensor- 
registration ll24l . (25). In the weather research literature, the 
serially-correlated forecasting error arising due to modeling 
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deficiency is often considered separately and tracked along- 
side the model parameters |26|, [27|. In the sensor network 
literature, drift-aware networks perform learning-based collab- 
orative bias estimation to enhance the effective lifetime of the 
network (9), ll28l . However, in this paper, we are focused on 
the quality of estimation in the presence of systematic errors, 
rather than techniques on mitigating systematic errors. 

Several researchers have studied the Cramer-Rao bounds for 
polynomial (or polyphase) signal estimation in the presence of 
independent (or correlated) noise. The CRB is usually obtained 
from the inverse of the Fisher Information Matrix (FIM) 1 10 1. 
For Additive White Gaussian Noise (AWGN), the large sample 
approximation of FIM is known to be a multiple of the 
Hilbert Matrix (e.g., ||29ll ). The second order approximation 
was derived in [30 1 in the context of polynomial phase signals. 
For a mixture of additive and multiplicative white Gaussian 
noise, the large sample FIM was shown in [31] to be a 
scalar multiple of the AWGN case. Our primary contribution 
in this paper is the derivation of (approximate) closed-form 
CRB for polynomial signal estimation in the presence of a 
mixture of white (measurement noise) and AR(1) Gaussian 
noise (drift). We also discuss the non-stationary case when the 
autoregressive parameter is equal to 1. To the knowledge of 
the authors, polynomial signal estimation in such a mixture of 
noise has not been considered earlier. As mentioned earlier, 
the study of estimating polynomial signals in AR(1)+White 
noise would help us characterize the capability of a sensor 
network to infer the parameters of a time-varying signal using 
sensors with drift. 

The rest of the paper is organized as follows. In Section 
[IT] we formally list the assumptions and set up the problem 
for two scenarios based on the initial state of the sensor. In 
Section III we consider the single-sensor case and derive 
large sample approximations of the CRB for p < 1 and 
p = 1. In Section IV we extend the results to multiple 
sensors having different drift characteristics. In Section [V] 
we demonstrate the application of the results to a bandwidth 
limited sensor network that communicates only quantized 
observations. Concluding remarks and scope of future research 



are provided in Section VI 



II. Problem Formulation 

To be more specific about our problem framework, we 
formally state the assumptions here. 1) Same phenomenon: 
Each sensor is observing the same physical phenomenon 
(e.g., temperature), which is modeled as a time-polynomial 
signal. In our framework, if multiple signals are to be sensed 
(e.g., temperature and humidity), the observations have to 
be transmitted separately and the parameters independently 
estimated. 2) Known drift statistics: The drift in each sensor is 
modeled as AR(1) Gaussian time-series with known statistics 
- namely the autoregressive and strength parameters, p m and 
7 m respectively. 3) Spatially uncorrelated noise and drift: We 
assume that the observations at all the sensors at a particular 
instant are independent, conditioned on the signal magnitude 
at that instant. In other words, though the observation noise 
samples d n<m + w n . m are temporally correlated (due to drift), 



there is no spatial correlation among them. 4) Synchronized 
observations: The clocks of the sensors are synchronized 
and they have identical sampling intervals. 5) Parallel sensor 
network with fusion center: We assume that the sensors do not 
collaborate among themselves and rather communicate their 
observations only to the sink. 6) Reliable signal transmission: 
In Sections [HI] and |IV| we assume that the noisy observations 
are communicated perfectly to the sink without any further 
distortion. We call this the full-precision case which helps 
obtain a benchmark performance. However, there may be cases 
when, due to power and bandwidth limitations at the sensor 
nodes, reliable communication of full-precision observations 
may be impossible. In such cases, a digital communication 
based framework in conjunction with efficient channel coding 
can be used to reliably transmit only a finite number of bits 0, 
l32l . In Section [V] we will discuss inference using digitized 
observations where all sensor nodes perform quantization 
with identical fidelity With the aforementioned assumptions, 
our goal in this paper is to derive approximate closed form 
expressions for the Cramer-Rao bounds. 



A. Sensor calibration and noise covariance 

Consider a single sensor with noise and drift properties 
denoted by {a 2 , 7, p} (the subscript m is dropped for most of 
this subsection). Assume that inference has to be performed 
from noisy observations z n at instants n = 1,...,N. Let 
r > denote the time-instants elapsed since the sensor was 
last calibrated, i.e., when the drift component was set to zero 
by correcting the baseline. Since the duration of inference 
starts from instant 1, it follows that the drift sequence was 
set to zero at instant 1 — r (which is a non-positive index, a 
slight notational inconvenience). Therefore, the drift sequence 
proceeds as follows, 

di_ T = 0, d 2 - T = Si, d 3 _ T = p&i + 5 2 ... etc.. (7) 

Since each of the drift innovations <5; ~ Af(0,ja 2 ), we have 

VarKO = 7 a 2 (l + p 2 + ■ ■ ■ + p 2 ^"" 1 )) =: 1( r 2 Sl, (8) 

where the summation within the parenthesis is defined as S^. 
Though the drift sequence {d n }n=i lS stationary for large 
magnitudes of r, 



lim ST 



1 



1 



p< 1, 



it is not stationary in the transient stage, for which, 
ST < S 2 r < • • • < S T N , t < 00. 



(9) 



(10) 



In this paper, though we would solve the estimation problem 
for a general r, we would refer to the limiting cases r = 1 as 
calibrated (C) and r — > 00 as uncalibrated (U) respectively. 

' For sensor networks where there is a constraint on system- wide bandwidth 
(summed across all nodes), different sensors nodes may be assigned different 
fidelity of quantization based on their observation quality, e.g., 1331 . Though 
outside the scope of this paper, optimal fidelity assignment for sensors in the 
presence of drift is a challenging topic worthy of future research. 



IEEE TRANSACTIONS ON SIGNAL PROCESSING (TO APPEAR IN OCT. 2012 ISSUE) 



4 



We define R such that ^a 2 R is the covariance matrix of 



the drift vector d = [d\, 
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(11) 



For the uncalibrated case, the diagonal elements are the same 
(Sn — 1 _ 1 p 2 X and R takes the shape of the well known 
stationary matrix (e.g., OH), 



E[d'd] 



7<r 



1 

P 

N-l 



P 
1 



P 



P 



N-2 



N-V 
Jf-2 



1 



(12) 



sometimes referred to as the Kac-Murdock-Szego matrix. 

For the m th sensor, we denote the covariance of the drift 
sequence by R m . Let S m denote the covariance matrix of the 
total error (AR( 1 )+White noise), i.e., e m = w m + d m , so that, 

S m = o 2 m (I + j m Rm) , (13) 

where J is the N x N identity matrix. 

B. Maximum-Likelihood estimation and Cramer-Rao bound 

Given the linear observation model (JT| and noise covariance 
described in Section |II-A| the Maximum-Likelihood (ML) 
estimator of (3 at the sink is of the form (for a reference on 
estimation theory, see ifTUl . |TT)), 

M 

^ = J- l X> J2 S- X z m , d4) 

m— 1 

where J is known as the Fisher Information matrix, 



M 



J±X> PTE" 1 



(15) 



\m— 1 



It is well known in estimation theory [ 10 1 that within the 
class of unbiased estimators, the ML estimator of a linear 
model is optimal in terms of estimation variance [ 10 1. Also, the 
least possible estimation variance is provided by the Cramer- 
Rao lower bound, V = J~ , so that 

E 



> V p , p = [j- 



>p,p ■ 



(16) 



for < p < P. Since Equation ( fTo} holds with a strict equality 
for linear models, the CRB is an appropriate performance 
metric for our problem. 

It is unclear from ( fT5] > and ( fTS] ) exactly how the estimation 
variance V p , p depends on the drift parameters ff^,7 m ,p m , 
signal parameters P and the sample size N. Our goal in 
this paper is to derive large sample approximations for these 
Cramer-Rao bounds and thereby provide insight into the 
behavior of estimation performance as it relates to the drift 
properties of the sensors. 



III. Main Result: Single Sensor Case 

Towards obtaining the CRB for the general case of multi- 
ple sensors, we first consider the single-sensor scenario and 
thereby obtain the core results of this paper. Subsequent 
applications of these core results to the multi-sensor scenario 
(Section |IV| i and bandwidth limited networks (Section fV) , as 
we shall see later, will be somewhat straightforward exten- 
sions. 

While analyzing the single-sensor scenario, for notational 
brevity, we drop the sensor-index subscript m from a 2 n , p m , 
7 m , R m and S m for the remainder of this section. Note from 
( p3| ) and ( [To} that, for a single sensor, the Fisher Information 
matrix is J = X'S _1 X and the Cramer-Rao bound is 
[^]p.p = [J~ 1 ]p,p- I n me following discussion, we first 
compute the inverses of the disturbance covariance matrices. 
Next, we use those results to derive the CRB. 



A. Inverse of disturbance covariance 

Our goal is to approximate S _1 in a form that are ana- 
lytically tractable. The exact closed form expression for S^ 1 
(for r — > oo) is known to be quite complicated (e.g., p-53 of 
||35ll , [36]) and we will not use it. The authors have not found 
a closed form expression for either S^ 1 (the t — 1 case) or 
XP 1 (for general r) in the literature. 

In this paper, we propose an approximation to S _1 that is 
novel to the best of knowledge of the authors. We suggest the 
following form for the inverse 



(7 + 7-Rr 1 = I -vM + 0{y 



(17) 



where v is a constant, M is a matrix with certain structure 
(to be described later), and y is a quantity less than 1, so 
that 0(y N ) represents a term that vanishes exponentially with 
sample size N. The idea is that, for large N, we should be 
able to use the approximation 



(I + jR) 



I vM 



(18) 



towards computing the CRB. The specifics of this approxima- 
tion are laid out in Proposition [JJ 

Proposition 1: Based on drift parameters 7 and p, define 
the following constants, 



7 + 1 



p(l-y 2 ) 



, and k = 




P) -4 



(19) 



Then (J + jR) 1 = I vM + G(y N ), where M has the 
structure shown in Equation pTj ), with 

a^l + y 2 ^)^, b^l + y^K, 

1-y 2 l Qt 4 p 2t 

l-py + p T y/p ' r 1 + p 2 + ■ ■ ■ + p 2T ~ 2 

(20) 

Proof: See Appendix |A] ■ 
The constant y can perhaps be more conveniently described 
as the smaller of the roots of the quadratic equation (both 
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h . 



1 + y 



20-1). 



(21) 



of which are positive) y 2 



+ pj + 1 = 0. It is easy to 

establish that y < 1 for p < 1 and for all 7, so that y N — > for 
large sample size N. Also note that for the two special cases of 
r = 1 and r = 00, gc — p 2 , f]c — ~y 2 and Qu = 0, r\u = n. 

Remark^!: Identity: The following identity will be used 
in several places in this paper and follows from definition ( fT9( ) 
after some algebraic manipulations, 



72/ = 



(22) 



Remark^Z: It may be noted that M is defined entirely by 
the diagonal elements ai,a 2 , . . . , a^jvj (counted from top) and 
61, 62, • • • j &r jv -1 (counted from bottom). From the expressions 
for ai and bj in Equation ( f2"T) , we note that for large N, 
both a^jvj and 6|-jv^ converge to the same value, namely 1. 
That means that it should not matter whether the anti-diagonal 
elements are expressed in terms of et^s or bj-s. In Proposition 
[T| we describe the antidiagonal elements in terms of bj-s just 
for the sake of being definitive. 

B. Computation of Fisher Information Matrix 

We use Proposition[T]in (\3\ and ([15) to compute the Fisher 
Information matrix, 



J w -^X'(I -vM)X, 
a 1 



(23) 



which would be inverted later to obtain the CRB-s. We 
consider the matrices X'X and X' MX individually before 
summing them up. We would approximate all the elements of 
these matrices as polynomials in N (i.e., the sample size). This 
will help derive approximations of the CRB that are correct 
upto the second order for p < 1. 

We note from (|6]l that X 1 X is a Hankel (equal skew- 
diagonal elements) matrix, 



A' 



n=l 



(24) 



It is a well known result, e.g., 1301 , QTI that summations of 
the form d24ll can be written as 



(25) 



N q 

^ n q = V B qA N q+1 -\ q > 0, where 



n=l 



12 



0, etc. 



A general form for B q ^ can be found, for example, on p-1, 
l37l . However, all the results in this paper will be established 
using the first four terms which we have enumerated in p5) . 
Similar polynomial expressions can be obtained for X' MX. 



Proposition 2: For < k,l < P, we have 



i=0 



k+l+l-i 



x k,l 



0(N 



k+L+1 y N ) 



(26) 



where some leading constants Ak.i.i and Oiu , are 



A 



k.LO 



Ak.1.1 = 



l + y 



1-yk + l + l' 
1 + 2k - 2y - y 2 



S T ) A 



2(1 - y) 2 
-2y + i] T + k 



k + l>l, 



(27) 



etc. 



(i-y) 2 

Proof: See Appendix |B] 
The exact form of Ak.i,i and aZ, are provided in Appendix 
[B] From p6l, it is clear that the Fisher information for 
calibrated and uncalibrated cases differ only by a constant 
(dependence on r is explicitly indicated). This means that 
when sample size is large, estimation accuracy will be similar 
for both the cases. We will elaborate this point later. Next, 
we use the result in Proposition [2] to derive the CRBs for the 
cases when p < 1 and p = 1. 

C. Cramer-Rao Bounds for p < 1 

Using (p4|,(p5]l and |26) in ((23), we obtain the compact 
second order expression for the Fisher Information Matrix, 

N 2 



NE 



N 



O 



E, (28) 



where E, H are (P+ 1) x (P + l) matrices, e, / are (P+l)- 
dimensional vectors and S.oAii^,^ are constants defined by 



£^diag{l,...,iV p }, H Kl ± 



1 



e^fl, 



Co 



,1]', /=[i,o 
l + y 



k + l + i' 
o]', 



1 - v 



At) a 
?2 — 



1 



1-2/ 

^0,0' 



6 = 



1 



- vA\ 



(29) 
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where < k, I < P. The notation Ai is actually Ak.1,1 with 
the subscripts dropped, since Akj.i does not depend on k and 
I (see (|2"7)). H is the well known Hilbert matrix [30]. 

The leading constant £ can be simplified further based on 
definitions in ([19} and the identity ( |2"2") , 



ft 



l 



i 



7 



(30) 



(l-p)\ 

The CRB will be obtained as the inverse of the FIM described 
in ( p8| . Since p < 1, we ensure from ([30} that £o > 0- There- 
fore, the FIM of the form ( |28) can be inverted assuming £o-H~ 
as the dominant term. Such an inversion was performed in ll30ll 
in the context of polynomial-phase signals. We summarize the 
analysis in ll30l in the form of Lemma [T] 

Lemma 1: [30|: Let H be the Hilbert matrix and e, f 
vectors as defined in ([29}. Let cq, c\,C2 be constants such that 
cq ^ 0. Then, for < p < P, the diagonal elements are 



c H 



K 



p,p 



co 



Kp^ p — 



1 ( > 



1 



c 2 ff) 



(31) 



1 



(ci +c 2 L P , p )+0 



2p+l Nc 
where Kp p and Lp p are defined by 

'P + p\ fP 



1 

]V2 



(P+P+l) 



Lp,p — 



P + l 
P + l 



(32) 

The idea behind Lemma [T] is that, for sufficiently large 
N, terms of order O(^) in ([31} can be ignored and we 
obtain approximate closed form expressions for the diagonal 
elements. The approximation in pT) is accurate only for 
small relative magnitude of the second order term, i.e., small 
magnitudes of ( 2p+1 )^i+ C2Lp -p) _ Lemma jljcan be used now to 
invert ([28}. The Cramer-Rao bound (diagonal terms) obtained 
in such a manner are summarized in Proposition [3] 

Proposition 3: For p < 1 and sufficiently large N, the CRB 
for estimating (3 p is 



[V] 



K 



p.p 



1 



2p + 1 JVC, 



P,p 



(33) 



for < p < P. 

A few remarks due to Proposition [3] are in order. 

Remark [3] 1: Asymptotic performance: As expected, the 
estimation performance does not depend on transient infor- 
mation (r) when the number of samples used for inference 
is sufficiently large. From ( |30} and ( |33} , the first order 
approximation of CRB is 

a 2 K f 



1 



7 



^p,p 



N 2 P+ 1 (2p+l) 



(34) 



(l-p)\ 

which is true for any value of r. 

Remark [i]2: Equivalent AWGN noise: For estimation of 
polynomial signals, the effect of drift is asymptotically (upto 
first order) equivalent to scaling the measurement noise by a 
factor of 1 + 7/(1 — p) 2 , where 7, p characterizes the drift 
properties of a sensor. 



Remark ^\3: Constant signal: x n — /3q , Vn: Simpler and 
more precise expressions for CRB may be obtained for a 
constant signal. When P = 0, we can directly proceed from 
d28}. For this case, we have ee' = //' = E = H = 1, hence 



V 



1 



(35) 



where the scalar V denotes the Cramer-Rao bound. 

Remark [3]4: Dependence on t: In terms of estimation 
of a signal, intuitively, the transient state r < 00 should be 
more informative, since the drift starts from a known point, 
rather than an unknown point. This intuition is corroborated 
by equation ( [33) , from which it is easy to derive that [V] p . p 
increases with r (shown below). In conjunction with Re- 
mark [3] 1, we can conclude that, though calibration always 
results in better estimates, the relative gain in performance 
([^V]p,j> — [^"c]p,p)/[V'tr]p,p) diminishes with sample size. 

Proof that [V] P)P increases with r: Since £0 > (from 
pO}), it suffices to show that £2 decreases with r (see ([33}), 
or equivalently, ol^\ increases with r (see ( |29} , note that v > 
0), which is further equivalent to showing that rj T increases 
with r (see (|27}). The last condition is verified from ( |20} , 
since p T is a decreasing function of r and < p, y < 1. 

7?eOTar^[i]5: Approximation region: The CRB as expressed 
by ( p3") is accurate only for cases when the relative magnitudes 
of the second order terms, 



(2p+i)(ei + ^ r) L P , p 



0<p<P, (36) 



are small. This condition helps specify the operating values of 
a 2 , 7, p, P and N for which the closed-form CRB ([33} can be 
used for performance analysis. 

D. Cramer-Rao Bounds for p = 1 

For the case when the AR parameter p = 1, the drift 
phenomenon is a non-stationary even in the limit of t — > 00, 
since it is a random walk (e.g., [9]) where the bias builds 
up with time and is unbounded. Hence, the CRB for the 
uncalibrated scenario is infinity and we consider only the 
transient case r < 00 here. 

For p = 1, the leading term of the Fisher information matrix 
J |28} vanishes since ( [30} implies that £0 = 0. Hence, terms 
of order 0(N~ 2 ) need to be considered. In particular, using 
definitions ([19}, ([27} and ( |29} , the constants can further be 
simplified as 



Co = 0, a = 0, q. 



r = -, 7- VT" 



47- 



1 



7 + 1 



2t 

7^1 



(37) 



and we subsequently note (see Appendix |C} that the FIM is 
of the structure depicted in equation ([38), 



J = E 



At) 
S2 



f'p + ^im 
±D[iUH P + ±F + 0[ 

(r). 



JY 



(38) 



with F = ^epe'p + ^ T> fpf' P , 
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with the matrix D and constants £ 3 

,(r) A 1 



?4,C5,^ T) defined as 



D4dia g {i,...,p}, er = 



i 

9 ' 
<J Z J 



1 



a 2 (1-y) 



1 (t) 



(39) 



and ep,f P and Jfp being the P-dimensional equivalents 
of e, / and H (defined in ((29])) respectively. The CRB is 
obtained from the FIM ( [38] ) by using block-inversion and 
Lemma [T] (see Appendix [C} , 

Proposition 4: For p = 1 and sufficiently large AT, the CRB 
for estimating /3 p , [V] P)P is equal to 




(40) 



A few remarks due to Proposition [4] are noted below. 

Remark^!: Constant Signal: x n = /3o,Vn: For a random 
walk drift, a constant signal can only be estimated inconsis- 
tently with asymptotic variance 



V 



2a 2 
7+1 



2r 
7^1 



(41) 



It is explicitly seen from ( HTI that the variance increases 
with white noise variance a 2 , the drift strength 7 (note 
that 7 decreases with 7, see ((37|), and the time since last 
calibration, r. This phenomenon of inconsistent estimation 
can be intuitively understood as follows. Even without the 
white noise component in (j3j, the observation never captures 
independent readings of /3q- Rather, the samples are 



zi = Po + d T+1 + w 2 = (Po + d T ) + S T+1 + w 2 , etc., 



(42) 



which means that p\ + d T appears together in all subsequent 
observations, thereby making p\ indistinguishable from d T . 
Since d T has a finite variance, /3 can only be estimated 
inconsistently. 

Remark |5]2; Time Features: For other parameters whose 
effect on the signal vary with time, i.e., /3 p for p > 1, the 
CRB is (to first order) equivalent to the case of estimating 
the derivative of the signal in drift-only noise. We note that 
the derivative of the signal is x'(t) = 2^2p=iPPp^ pl and the 
forward difference (see ([3])), 

z n +i - z„ = X n+ i - x n + d n+ i - d„ + W n+ i - w n 

= x'(n)(AT) + S n + w n+ i - w n , (43) 

which is an estimator of the derivative, contains independent 
drift innovations S n with variance -fa 2 . Hence in Proposition 
01 the variance of estimating (3 p is scaled down by a factor of 
p 2 compared to the equivalent case with white noise 7c 2 . 



Remark^3: Approximation region: The CRB as expressed 
by (j40j is accurate only for cases when the relative magnitude 
of the second order terms, 




P = 0, 



, 1 < P < P- 



(44) 



are small. This condition helps specify the operating values of 
er 2 ,7, P and N for which the closed-form CRB ( |40"| ) can be 
used for performance analysis. 

E. Numerical Results - Maximum Relative Error 

In the remainder of this section, we demonstrate the ac- 
curacy of the closed-form performance approximations using 
representative examples. We shall consider the approximations 
expressed in both the forms of Fisher Infomation matrix 
(Equations ( |28| ) and ( |3~8| l, with O (w) terms truncated) and 
Cramer-Rao bounds (Equations ([33| and (|40]>). 

We use the metric called Maximum Relative Error (MRE) 
which was also used in [30]. This involves computing both the 
exact ( fT6] i and approximate CRB-s. The relative deviations of 
the approximations are then calculated for all < p < P and 
the largest one is called the MRE. We denote 1) [V™] P|P 
as the theoretical CRB as in ( fT6| l and 2) [^ AP ] P , P as the 
approximate CRB derived either through 2a) FIM approx: 
inversion of the intermediate FIM, i.e., [V AP ] PiP = [J _1 ] PlP , 
in Equations ( |28] i and ([38| with O (7^2) terms truncated, or 
2b) CRB approx: final Cramer-Rao bounds in Equations ( [33] ) 
and d40|. Then the maximum relative error is defined by 



MRE 



w 



THi 



max 

p=0,l, - ,p 



[V 



APi 



(45) 



In short, the MRE summarizes the approximation error over 
all components of the parameter vector. 

As an example, we consider the estimation of a cubic- 
polynomial signal, i.e., P = 2. Since our performance bounds 
are asymptotically accurate, it is expected that the MRE 
will decrease with increasing sample size N. In Figure [T] 
we display the sample size N e required for 95% accuracy, 
or in other words, MRE < e = 0.05. Since MRE is a 
ratio of variances, it does not depend on the measurement 
noise variance a 2 . While considering the p < 1 case, we 
have displayed the wide parameter region 7 £ [10 -3 , 1], 
p G [0.6,0.97] in Figures 1(a) and 1(b) - which demonstrate 
the uncalibrated (r — » 00) and calibrated (r = 1) scenarios 
respectively. The p = 1, r = 1 case is demonstrated in |l(c)| 
where we have displayed the parameter region 7 e [10~ 3 , 1]. 

Figure [T] provides useful guidance on the applicability and 
limitations of the performance bounds derived in this paper. 
Firstly, the performance bounds are found to be reasonably 
accurate over a wide range of possible parameter values and 
for a moderate number (tens and hundreds) of samples. For 
accurate performance prediction in stationary drift (Figures 
|l(a)| and |l(b)| >, higher values of drift-autocorrelation (p) and 
drift-strength (7) generally requires larger observation du- 
rations (N e ). This is predicted by the approximation-region 
condition in ([36]), since the denominator £0 (see d30|) is 
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(b) Stationary drift (p < 1) and uncalibrated sensors. 



CRB Approx 
FIM Approx 




ie-2 1 e-1 

Drift strength, 7 

(c) Non-stationary drift (p = 1). 
Fig. 1. Sample size N t required for 95% accurate performance prediction 

inversely proportional to p and 7. For the random-walk drift 



scenario (Figure 1(c) 1, higher value of drift-strength 7 requires 
smaller observation durations (N e ) for accurate performance 
predictions. This too, can be explained from approximation- 
region condition in ( |44) , where, with the aid of definitions ( |3~7| ) 
and ( [39) , it can be established that e p = C(7 -1 / 2 ) for small 
7. This also partially explains the log-linear relation between 
N e and 7 in Figure 1(c) 



We have so far only considered the estimation performance 
of a single sensor that is able to reliably communicate its 
observations to the sink without any distortion. In Sections IV 
and[V] we discuss extensions to the multiple sensor framework 
and bandwidth limited networks. 



IV. Multiple Sensors 

In this section, we consider the application of the results in 
Section [Til] to multiple sensors with different noise and drift 
parameters {of„, 7 m , p m }- Since the sensor noise and drift 
are independent across the sensors (Assumption 3), the Fisher 
Information for f3 is equal to the sum of individual FIM-s (see 
equation (fT5))), J = J\+J2+- ■ -+Jm- Hence the expressions 
for FIM (given by ( |28j ) and ( j38] l) and subsequently CRB (given 
by ( |33| ) and ( |4"0| )) can be extended with the following change 



M 

£ 

m— 1 



£#(cr™,7m,Pm), for [#] = 0, 1, . . . , 6, (46) 



where £#(<7 2 ,7,p) can be thought of as a function of its 
arguments as defined in ( p9| ) and ( |39| l and £#.eff denote the 
effective value of the constant. 

As an illustration of this extension, we consider an example 
where noise and drift parameters are uniformly (randomly) 
distributed over a given range, say, 

p m € [phPu], <? m S Wi^l] and 7m G [7H,7u]- 



(47) 

When the number of sensors M is large, ( |46| i can be further 
approximated by substituting the summations by integrals, 



£#,e ff = ME[£#] 
M 



£#dpdjda 2 . (48) 



'ii •> pi 

We would refer to the CRB derived using the constants £#,eff 
(arising out of integration) as the Average-CRB . We expect 
the Average-CRB to be an effective indicator of system per- 
formance for large number of sensors M, as we demonstrate 
with some numerical results below. 

A. Numerical Results 

We demonstrate the results for both p < 1 and p = 1 cases. 
The simulation setup is described next. 1) We consider a 
linear signal, i.e., P = 1, The parameters to be estimated are 
the constant term (3 and the slope term fi\.2) We consider 
multi-sensor systems with the number of sensors, M, starting 
from 25 and going upto 100. 3) For the p < 1 scenario, 
for each sensor, the drift parameters are randomly selected by 
choosing the parameters from the range, 

p m e [.85, .95], a 2 „ £ [72, 288] and 7m e [.6, 2.4], (49) 

which is close to an estimated spectrum in ifTTl . Within the 
p < 1 scenario, we simulate both calibrated (r m = 1, Vm) 
and uncalibrated (r m = oo, Vm) cases. In one simulation, 
we assume that all M sensors are calibrated while on another 
simulation we assume that none of them are calibrated. 4) For 
the p = 1 scenario, the drift parameters are randomly selected 
by choosing the parameters from the range, 



<j 2 m e [2,12] and 7m e [.05, .3]. 



(50) 



5) For each realization of an M-sensor network, ML- 
estimation of the constant and slope parameters were per- 
formed and the error variances were averaged over 10 5 Monte- 
Carlo trials (realizing the measurement noise and sensor drift). 

6) Several (10 3 ) Monte-carlo trials (realizing A/-sensor net- 
works with (7 2 ,p m ,7 m chosen from above parameter range) 
are performed and the average and 95% confidence interval of 
the error variances are observed. 7) We repeat the experiments 
for sample sizes N — 80 and 160. These samples sizes 
were chosen to ensure moderate computational (Monte-Carlo) 
effort. Since we consider a small number of samples, we have 
used the expressions for Fisher Information Matrix (Equations 
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Fig. 2. Average performance for multi-sensor systems for p S [0.85,0.95]. 



Fig. 4. Bandwidth constrained sensor network where inference is performed 
using quantized data. 
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Fig. 3. Average performance for multi-sensor systems for p = 1. 



( |28[ > and ( |38] l) to predict the estimation error variance of the 
constant (3o and slope /?i portion of a signal. The results 
are displayed in Figures [2] and [5] of which we make some 
comments below. 

Firstly, an approximation to the system performance using 
( |48| i is seen to be fairly accurate, as depicted in Figures [2] 
and [3] In all cases, the variance is inversely proportional to 
the number of sensors, as depicted by the log-linearity of all 
the curves. The (average) performance prediction improves 
(error-bars becomes shorter) for higher sample sizes, since the 
summation |46) is better approximated by the integral ( |48j ) for 
large M. This means that just by knowing the range of the 
parameter values and the number of sensors, we can have a 
good understanding about the estimation performance of the 
entire sensor network. Secondly, from Figure [2j we note the 
intuitive phenomenon that the performance using calibrated 
sensors is better than that using uncalibrated sensors. Also, 



though for N = 80, the performance gap is quite large, 
the gap between calibrated and uncalibrated cases narrows 
down for N = 160. This corroborates Remark [3]4, where we 
noted that the relative performance gain diminishes for higher 
sample sizes. For the p = 1 case, from Figure [3] we note that 
increasing the sample size does not help in estimation of the 
constant portion of the signal. This is due the inconsistency 
property of the non-stationary drift model, as described in 
Remark @]1. 

V. Quantized observations 



III 



As another application of the results in Section 
consider the performance characterization of a sensor network 
where resource (bandwidth, power, etc.) available for commu- 
nication is limited [2|, |32|. Bandwidth constraints preclude 
the reliable communication of full-precision observations from 
the individual sensor nodes to the sink. In such situations, 
the observations must be compressed and digitized prior to 
transmission l38l . In this section, we digitize each observation 
by using uniform quantization, as will be discussed shortly. 
Since the reliability of the transmission of digitized obser- 
vations may be further enhanced through the use of efficient 
channel coding (error-control codes), it is reasonable to assume 
that once the quantization is performed, there is no further 
deterioration of the observation-quality J2)- In other words, 
the quantized observations are assumed to be available error- 
free at the sink. The schematic diagram of such a system is 
depicted in Figure |4] Our goal in this section is to predict 
the estimation performance of the sensor network composed 
of sensors with drift, when only quantized observations are 
available for inference at the sink. 

We use uniform quantization [33 1, in which each of the ob- 
servations z n m are quantized uniformly with Z-bits. Assuming 
that the observations are bounded z nm £ [Uq,Ui] and the 



quantization thresholds aj g [Uq,Ui], j 
formly spaced such that a J+ i — a 3 = (U\ 



- ( y )/(2 i -i)^A, 
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&q — m z n nl ) ] — 



(51) 



the expected distortion due to the quantization process is 

12 

Since the quantization noise ~z n , m — z n ^ m , in general, is 
neither independent across time nor Gaussian distributed, the 
Maximum-Likelihood estimator is difficult to design. How- 
ever, the quasi-ML estimator is easier to implement, which 
is designed on the assumption that the quantization noise 
is i.i.d. Gaussian with variance Oq. We will use the quasi- 
ML estimator, described below, to perform inference using 
quantized observations. 

Effectively, the quasi-ML estimator assumes that the noise 
at individual sensor nodes has an added component <JqI, with 
the total covariance being (compare with ( fT3] >) 

= OqI + of n (I + 7 m R*, m ) 

^ n (I + 7 m R^ m ), (52) 



where a m ,7m satisfy a z m = af n + u A Q and j m = 1 
Accordingly, the quasi-ML estimator is (compare to 



where 




(53) 



Note that, in the limiting case when quantization errors are 
small (equivalently, large I), we have er^ — > <x^,7 m —> 7 m 
and the quasi-ML-estimator is identical to the ML-estimator. 

We would refer to the CRB derived using modified noise 
parameters {ct^, j m , p m } as the Modified-CRB . We expect the 
Modified-CRB to be an effective indicator of system perfor- 
mance for moderate to large number of quantization levels 2 l . 



Similar to Section IV with certain change in definitions, the 
expressions for FIM (given by ( |28] i and ( |3~8| l) and subsequently 
CRB (given by ( |33| ) and ( |40| i) can be extended to obtain 
the Modified-CRB, £#,eff 4 £™ =1 £ # (p 

mi ff ii7m); where 

£#(p, 0-2 j 7) can be thought of as a function of its arguments 
as defined in Q and (f39) for [#] = 0, 1, . . . , 6. We show 
some numerical results below to corroborate the effectiveness 
of Modified-CRB as an efficient performance predictor. 

A. Numerical Results 

The simulation setup is described as follows. 1) A sample 
size of N = 400 and network size of M = 25 nodes was 
considered. 2) The noise and drift parameters of the sensor 
nodes are chosen by uniformly spacing them in the range, 

72 = a\ < erf < • • • < <t 2 m = 288, 
0.6 = 71 < 72 < • • • < 7m = 2.4, and 
0.85 = pi < p 2 < ■ ■ ■ < pm = 0.95. 

Both calibrated (r m = 1, Vm) and uncalibrated (r m = 
00, Vm) cases were considered. 3) We assume a linear 
signal, i.e., P = 1 with the constant term /3q — 400 and 
the slope term /?i = 0.9. The range of the observations 
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Fig. 5. Cramer-Rao bounds with quantized observations. 



to be quantized was chosen to be Uq = 0,U\ = 1200, 
beyond which the observations were clipped. The range was 
deliberately chosen large so that clipping (which is another 
source of distortion which we have not modeled) does not 
occur frequently. 4) Uniform quantization is performed using 
I = 5, 6, . . . , 9 bits per observation. 5) Quasi-ML estimation 
of the constant and slope parameters was performed and the 
error variances were averaged over 10 8 Monte-Carlo trials 
(realizing the measurement noise and sensor drift). The 95% 
confidence interval of the error variances were also observed. 

The results of Monte-Carlo simulations are compared to 
theoretical predictions from Modified-CRB and displayed in 
Figure [5] The full -precision CRB is also displayed in the figure 
(labeled w/o Quant), marking the convergence of Modified- 
CRB in the large-Z regime. The actual estimation variance of 
both the constant and slope parameters (of the linear signal) 
seem to agree, with reasonable accuracy, to the Modified-CRB. 

VI. Conclusion 

In this paper, we have derived approximate bounds for the 
estimation accuracy of polynomial signals using sensors that 
exhibit drift in addition to having measurement errors. This 
is important since sensor drift, or loss of calibration with 
time, is a major problem in many applications. The theoret- 
ical closed form expressions are validated through numerical 
results. As future work, we intend to derive performance 
measures for other signal models, e.g., stochastic models 
rather than deterministic. Additionally, it may be interesting 
to investigate the impact of spatial correlation (i.e, relaxing 
Assumption 3) on estimation performance. More bit-efficient 
quantization schemes (other than uniform quantization) that 
optimally allocate bandwidth to each of the sensors is also 
another topic of interest. Finally, in-network inference where 
only the summary of estimated parameters are communicated 
to the sink, rather than entire observations, will be another 
framework that one might consider. 
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Appendix A 
Proof of Proposition 1 

In Proposition [T] we need to show equation ((17), which is 
(I + -fRy 1 = I - vM + 0{y N ). Define M be such that 
(I + "/Ry 1 = I—vM, i.e., the exact form of ([17]). It suffices 
to prove that M = M + 0(y N ), or equivalently 

I + 0(y N ). 

— -l ~ 
To prove ( f54| i, we need M . Definition of M implies 



M 1 M 



(54) 



M =v{I+R~ L /i). (55) 

The inverse of Ru (given by ( (12} ) is well known [34|. The 
inverse of R is similar to that of Ru except for the top-left 
element. Specifically, 



R 



1 + Qr 



1+P 2 



(56) 



From (f55} and with the help of some identities that follow 
from ( |19| l, namely v(l + 1/7) = (1 — py + y 2 )/(l — y 2 ) and 
+ (1 + p 2 )h) = (1 + y 2 )/(l - y 2 ), we have, 



M 



-y 



1 + y 2 



-y 
-y 






-y 

1 - py + y\ 



where m T = 1 — py + y 2 + g T y/p- Constructing M from 
Equation ( (21} , we can now directly verify ( (54"} . Specifically, 
the residuaTis of the following structure 



M 1 M- 



y 

ro 



ri r 2 
r 2 



T3 





(57) 



where r = (1- py)(p- p 2 y + g T y), n = -Q T y, r 2 = -p(p- 
y)( l ~ PV) + £v(l + y 2 - py) and r 3 = r 2 + g T y{Q T - P )/p- 
Since ( (57} are terms of order 0(y N ), this completes the proof 
of |54} and hence Proposition [T] 



Appendix B 
Polynomial Approximation of X 1 MX 

First, we would decompose M (see ( pi) ) to aid analytical 
calculations. Note that 



M = I + Mi + r] T M 2 + kM 3 , 
and matrices M l7 M 2 and Ms are defined by 

y 
y 



(58) 



Mi 



M 2 4 



y N-2 y N- 3 
ly N-l y N-2 



1 y 

y 

,N-2 



y N-2 y N-X 
y N-S y N-2 



y N ~ 2 









(59) 



and (60) 



M 3 






y"- 1 

f- 1 y N ~ 2 



/ 



y N-l y N-2 



(61) 



respectively. To calculate X MX, we compute each of terms 
in X'X is given by ((24} and ((25}. For X'M X X, we 
refer to d59} and collect the identical powers of y from either 
side of the principal diagonal, 

JV-l N-i 

[X'M.XU = ^j/ I ^(j fc ( l +j) i +/(i + j) fc ) 
i=i j=i 

JV-l N-i k + l 

= E^EE^ &+! ~ r 



■ k + l-r 



i=l 2=1 r=0 
JV-l k+l N-i 

= E^E^* r E^ 

i=l r=0 j = l 

JV-l k + l k + l-r 

-E^E^ r E B s>r (N-i) k+l - r+1 - s 

i=l r=0 s=Q 
N — l q — 1 q — r—1 q — r — s 

= E^E^ r E B ^ E C*,:rN*- r —* 

i=\ r =0 s=0 £=0 

N—l q—1 q—r—1 q—r—s 

= E^E E E 

i=l r=0 s=0 t = 

JV — 1 /(/—I It D q u — 1 

= ^ ] j/ I ^ ] ^""^ ^ ] Xr lU — <u t ii—r + ^""^ ^ t Xr,q—v,v- 
i=l \u=0 «=0 r=0 

JV-l /3— 1 1 



(<0 



i; — 1 r — 

q v — 1 



E »' E ^ E i " E + E iV E 



(9) 



i=l \u=0 

q—1 u 



D=0 r=0 



v—1 



N q ~ u E^E + E y " E ^.-.9 +0(^2/^) 



v=0 r=0 



v = l r=0 



(62) 
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where in step (a) we collect all like powers of i and j and 
define A r = ( ) + f ), (b) follows from the summation 
formula in |25| and the definition B s r = Bk+i~ r ,s, (c) is 
due to binomial expansion and definitions q = k + I + 1 
and Ct, s ,r = ( 9 t Step (d) follows from the definition 
X r>Stt 'i i r+t N q - r - a - t A r B a>r C t>a>r (-l) t , step (e) is an 
identity involving series rearrangement (i.e., true for any 
X r ,s,t, by defining u = r + s+t and v = r+t), step (f) follows 
from defining D rvu = A r B u - Vyr C v - r , u -v,r(—l) v ~ r - Step 
(g) follows from the definition Y v = ^"y 1 (which 

converges for finite v since y < 1 and the error term is of 
the order N v y N ). Quantities of the form Y v are also known 
as poly-logarithms. In the last step, we denote the polynomial 
coefficients by A u ,m x and the constant term by olm x . 

For X M%X, we start with d60l > and collect the identical 
powers of y from the top-left han% 

JV-l i 

[X'M 2 X] ktl = 1 £y i - 11 £j k (i + l-j) 1 

j=l 3 =1 



i,, n \ 



(63) 



3=1 



*Af 2 



where ajf 2 is a constant. Similarly, for X'M 3 X, we collect 
the identical powers of y from the bottom-right half of ( |6"Tj ), 

JV-l i 

[x'M 3 x] ktl = £ v" 1 E^ - i + V k ( N + i - 

i=l j=l 

JV-l i k I 

£ y'' 1 E E GrN k - r (-j + iy £ h s n i - s (j - o s 

i — 1 3—1 ?"=0 s— 

JV-l i fc+Z t 

E^EE^-'E^ 

i=l j=l t=0 s=0 



(64) 



where (a) follows from binomial expansion and definitions 
G r = ( ), H s = r). In (b), we have rearranged the sum to 
group similar exponents of N with the transformation t = r+s 
and defined K t , s = G t - s (-j + l)*- s ff s (j - i) s , (c) follows 
from similar arguments while deriving ( j62| and ( |63") l and the 
definition A t ,M 3 - E^i J2)=i E s =o t < k + I. 

We note that all the constants A^Mi > &Mi > a M2 > -4t,M 3 
depend on fc and i. We can now prove Proposition IT 
composing I'MX with the help of |62]l, §6%} and ( [64 
( |58| >. Define _4_ 1A / 3 = 0. The constants Ak,^. 
Proposition [2] are given by 

A k ,i,i = Bk+i,i + A%,Mi + «-^-i,M S ) < i < fc + /, 

respectively. Some of these constants are enumerated in Table 
[I] This completes the proof. For example, to compute A k ,i,o 
note that B k +i,o = fc+T+j, -4o.il/! = iT[+i> Y o = I=^> hence 

1 + y 1 



and a k T [ in 



(65) 



,4 



fc,/,0 



B 



k+l,0 



YnA, 



0-^0. Mi — 



1-yk + l + l' 



(66) 



which agrees with ( |2"7j ). 



Appendix C 

FIM AND CRB FOR (0 = 1 

First we derive the FIM given by Equation ( [38] i. Let the 
Taylor Series of the FIM be of the form 



J = 



1 



Qc 



l 

iV 



Qi 



l 

iV2 



1 



E. (67) 



The first term Q Q is equal to £2// (follows from ( |28| and 
the fact that £ = £j = for p = 1). The second order term 
follows from collecting third-order terms in ( p4| ) and ([26]) 
and adding them according to ( |23] l, i.e., 

fc = / = 0, 



[Qijfc,/ = 



B 



k+l,2 



vA 



0, 

(r) 
Y k,l ' 



fc + ; = 1, 

k + I > 2. 



(68) 



From definitions ( fT9| ) and identity |22|, we obtain the fol- 
lowing simplifications for p = 1, v = jjt, k = y and 



from Table Jlji leads to [Q 1 



(i_ y y , which when applied to §65f (and using values 

for k + I > 2, 



fe,l 



k+l-l 7 

which explains the term ^4 in ( f39] >. Similarly, Q 2 follows from 
collecting fourth-order terms, 



[Q 



2\k,l 



Bk 



fc+Z,3 



vA 



0. 

l k,l ) 
k,l,3, 



k + l<l, 
k + l = 2, 
k + l>3, 



(69) 



the last term of which when simplified yields [Q 2 



fc,/ 



-fc/ 



(l-2/) ; 



for fc + I > 3, that explains the term £5 in ((39 



The derivation of the CRLB in Equation ( |40[ > follows from the 
block inversion of J and subsequent application of Lemma [T] 



b' 
C 



b'[C-±bb'] 1 b 



a J 



rn 



M 



say. 



(70) 



From ( [38) , we have a 
„ 1 



(at 2 , 
1 

JV2 



and 



where subscript P is dropped from Hp,ep and / P . Next 
we apply Lemma [T] to obtain the diagonal elements [M] q ^ q 

of block M . By substituting cq = ^4, c\ = ^5 c 2 = S,^ — 
1^2 ^ anc ^ observing the facts that [-D -1 ] p . p = - and 



ip_i, p _i = P 2 /p 2 , we obtain [V] PiP for p > 1 as in ([40j». 
For the top-left element, m, the second order term is 

, 2 



b'Mb 



Mi 1, but M11 



Jfp-i.o 



1 

iV 1 



which completes the derivation of (|40 
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